An interactive account of bilingual lexical acquisition
Word acquisition starts around 6 months of age (Jusczyk and Aslin 1995; Tincoff and Jusczyk 1999; Bergelson and Swingley 2012)
Word-learning involves the (challenging) task of associating a word-form to its referential context (ambiguous, variable)
Bilinguals face the challenge of learning more than one word-form per referent
Bilinguals keep up with their monolingual peers: how?
Vocabulary checklist: number/proportion of words checked by caregivers as Understands, and/or Says
| | Understands | Understands & Says |
|---|---|---|
| chair | [ ] | [ ] |
| table | [ ] | [ ] |
| … | [ ] | [ ] |
English-Spanish bilinguals have smaller English vocabularies than monolinguals, but similar vocabulary sizes when both languages are summed together (Hoff et al. 2012)
Bilingual toddlers learning two typologically close languages showed larger vocabulary sizes than those learning more distant languages (Floccia et al. 2018)
Cognates: form-similar translation equivalents (TEs)
| Cognate | Non-cognate |
|---|---|
| [cat] /ˈgat-ˈgato/ | [dog] /ˈgos-ˈpe.ro/ |
Bilinguals acquire TEs from the earliest stages of vocabulary growth (Bilson et al. 2015; Tsui et al. 2022)
Cognateness facilitates vocabulary growth? Mechanisms?
Lexical access is language non-selective:
Translation equivalents are co-activated, even in monolingual situations
Cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2022; Bosch and Ramon-Casas 2014)
Dissociation between models of bilingual word processing (parallel activation) and word acquisition
Word acquisition as a continuous process of lexical consolidation (Hidaka 2013; Mollica and Piantadosi 2017)
For participant \(i\) and word \(j\):
\[ \begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \\ \text{Frequency}_j &\sim \text{Poisson}(\lambda) \end{aligned} \]
\[ \begin{aligned} \text{Age of acquisition}_{ij} &= \underset{\text{Age}_i}{\arg\min}\ |\text{Threshold}_{ij}-\text{Learning instances}_{ij}| \end{aligned} \]
We fix some parameters:
\[ \begin{aligned} \text{Threshold} &= 250 \\ \lambda &= 50 \end{aligned} \]
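The monolingual version of this model can be sketched in a few lines. This is only an illustration under stated assumptions, not the simulation code behind the slides: the monthly age grid (0 to 40 months), the random seed, and the function name `age_of_acquisition` are hypothetical, and frequency is read as learning instances per month.

```python
import numpy as np

THRESHOLD = 250  # learning instances required for acquisition (from the slides)
LAMBDA = 50      # Poisson rate of word frequency (from the slides)

def age_of_acquisition(frequency, ages, threshold=THRESHOLD):
    """Age at which cumulative learning instances come closest to the threshold."""
    instances = ages * frequency  # Learning instances = Age * Frequency
    return int(ages[np.argmin(np.abs(threshold - instances))])

rng = np.random.default_rng(seed=1)        # arbitrary seed (assumption)
ages = np.arange(0, 41)                    # hypothetical grid: 0 to 40 months
frequencies = rng.poisson(LAMBDA, size=5)  # one simulated frequency per word
simulated_aoa = [age_of_acquisition(f, ages) for f in frequencies]
```

With a frequency of exactly 50 instances per month, the threshold of 250 is reached at 5 months; rarer words reach it later.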
| Catalan | Spanish |
|---|---|
| 100% | 0% |
\[ \begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \\ &\quad + (\text{Similarity}_j \cdot \text{Learning instances}_{ij'}) \end{aligned} \]
| Catalan | Spanish |
|---|---|
| 60% | 40% |
| Catalan | Spanish |
|---|---|
| 75% | 25% |
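A rough sketch of the bilingual case follows. The slides do not state how language exposure enters the equation, so several choices here are assumptions: exposure scales the frequency of each language, both translation equivalents share the same frequency, and the cross-language term uses the translation equivalent's own (unboosted) learning instances to avoid circularity. The function name and age grid are hypothetical.

```python
import numpy as np

THRESHOLD = 250  # learning instances required for acquisition (from the slides)

def bilingual_age_of_acquisition(frequency, exposure, similarity, ages,
                                 threshold=THRESHOLD):
    """Age of acquisition when cognate overlap adds instances from the other language."""
    own = ages * frequency * exposure                    # instances in the target language
    translation = ages * frequency * (1 - exposure)      # instances of the translation equivalent
    instances = own + similarity * translation           # cross-language boost (assumed form)
    return int(ages[np.argmin(np.abs(threshold - instances))])

ages = np.arange(0, 41)  # hypothetical grid: 0 to 40 months

# Spanish word for a child exposed 75% to Catalan and 25% to Spanish
non_cognate = bilingual_age_of_acquisition(50, 0.25, 0.0, ages)
partial_cognate = bilingual_age_of_acquisition(50, 0.25, 0.5, ages)
identical_cognate = bilingual_age_of_acquisition(50, 0.25, 1.0, ages)
```

Under these assumptions the simulated age of acquisition drops from 20 months (no overlap) to 8 months (similarity 0.5) to 5 months (identical forms), which illustrates why the lower-exposure language stands to gain the most from cognateness.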
Online questionnaire (formr; Arslan, Walther, and Tata 2020), inspired by the MacArthur-Bates CDI (Fenson et al. 1994)
~1,600 items/words (800 Catalan + 800 Spanish)
Participants filled in one of four versions of the questionnaire:
500 items: 250 Catalan + 250 Spanish
Short-listed (nouns): 302 translation equivalents (TE)
138,078 item responses from 366 participants
| 1 time | 2 times | 3 times | 4 times |
|---|---|---|---|
| 312 | 42 | 8 | 4 |
Ordinal regression model: \(P(Understands)\), \(P(Says)\)
Multilevel: Crossed-random effects
Bayesian: probability of parameter values
\[P(\text{model} | \text{data}) \propto P(\text{data} | \text{model}) \times P(\text{model})\]
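To make the ordinal response concrete, here is a minimal sketch of how a cumulative-logit model turns a linear predictor into probabilities for three ordered categories (No, Understands, Understands & Says). The cumulative-logit form, the cutpoint values, and the function names are assumptions for illustration; the slides do not specify the link function or report cutpoints on this scale.

```python
import math

def inv_logit(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1 / (1 + math.exp(-x))

def category_probabilities(eta, cutpoints):
    """Cumulative-logit model: P(Y <= k) = inv_logit(cutpoint_k - eta)."""
    cumulative = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    # Category probabilities are differences of adjacent cumulative probabilities
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]

cutpoints = (-1.0, 1.0)  # hypothetical cutpoints, not estimates from the study
baseline = category_probabilities(0.0, cutpoints)  # [P(No), P(Understands), P(Understands & Says)]
older = category_probabilities(1.0, cutpoints)     # larger linear predictor, e.g. an older child
```

A larger linear predictor shifts probability mass toward the highest category, so `older[2]` exceeds `baseline[2]`.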
| Predictor | Example |
|---|---|
| Age | Months |
| Length | Number of phonemes |
| Exposure | Lexical frequency \(\times\) Language exposure |
| Cognateness | Levenshtein similarity between a word-form and its translation |
| Interactions | Two-way and three-way interactions between age, exposure, and cognateness |
| Predictor | Estimate | 95% HDI | p(H0) |
|---|---|---|---|
| Intercepts | | | |
| Comprehension and Production | 0.438 | [-0.5, 0.5] | 0.088 |
| Comprehension | 0.936 | [2.44, 0.95] | 0.000 |
| Slopes | | | |
| Age (+1 SD, 4.87 months) | 0.405 | [1.43, 0.45] | 0.000 |
| Exposure (+1 SD, 1.81) | 0.233 | [0.8, 0.27] | 0.000 |
| Cognateness (+1 SD, 25.65%) | 0.058 | [0.06, 0.1] | 0.037 |
| Length (+1 SD, 1.56 phonemes) | -0.062 | [-0.35, -0.04] | 0.000 |
| Age × Exposure | 0.071 | [0.16, 0.1] | 0.000 |
| Age × Cognateness | 0.014 | [0, 0.03] | 0.985 |
| Exposure × Cognateness | -0.057 | [-0.28, -0.05] | 0.000 |
| Age × Exposure × Cognateness | -0.018 | [-0.11, -0.01] | 0.975 |
Cognateness facilitates word acquisition
Only low-exposure words benefit from their cognate status: the less dominant language receives more facilitation
Parallel activation as a mechanism that boosts lexical consolidation: an increment in cumulative learning instances
Catalan-Spanish: very specific population
Next steps: word-learning, formalisation
Levenshtein distance: minimum number of single-character edits (insertions, deletions, substitutions) needed to make two character strings identical
| | Orthography | Phonology | String |
|---|---|---|---|
| Catalan | porta | /ˈpɔɾ.tə/ | pɔɾtə |
| Spanish | puerta | /ˈpweɾ.ta/ | pweɾta |
\[
1-\frac{\text{lev}(A, B)}{\max(\text{length}(A), \text{length}(B))}
\]
| Catalan | Spanish | Levenshtein similarity (distance) |
|---|---|---|
| porta (/ˈpɔɾ.tə/) | puerta (/ˈpweɾ.ta/) | 0.50 (3) |
| taula (/ˈtaw.lə/) | mesa (/ˈmesa/) | 0.00 (5) |
| cotxe (/ˈkɔ.t͡ʃə/) | coche (/ˈkot͡ʃe/) | 0.40 (3) |
| … | … | … |
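The similarity score can be reproduced with a short dynamic-programming sketch. The function names are hypothetical, and the strings are the stripped phonological transcriptions (no stress marks or syllable boundaries) as in the porta/puerta example; this is one straightforward way to compute the measure, not necessarily the implementation used in the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    """1 - lev(A, B) / max(length(A), length(B)); 1.0 for two empty strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest

# Note: len() counts Unicode code points, so combining marks such as the
# affricate tie bar count as separate characters unless removed beforehand.
porta = levenshtein_similarity("pɔɾtə", "pweɾta")  # 3 edits over 6 symbols
taula = levenshtein_similarity("tawlə", "mesa")    # 5 edits over 5 symbols
```

These two calls give 0.50 and 0.00, matching the first two rows of the table.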
International Symposium of Psycholinguistics | Vitoria, 31st May, 2023